sd: sync to master-593-3d6064b#2175
Conversation
c1de647 to
9f2f7a4
Compare
|
Copied from #2155:
We first pick the device name with this: __STATIC_INLINE__ std::string get_default_backend_name() {
ggml_backend_load_all_once();
// should pick the same backend as ggml_backend_init_best
ggml_backend_dev_t dev = ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_GPU);
dev = dev ? dev : ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_IGPU);
dev = dev ? dev : ggml_backend_dev_by_type(GGML_BACKEND_DEVICE_TYPE_CPU);
if (dev == nullptr) {
return "";
}
return ggml_backend_dev_name(dev);
}then initialize it by name.
It still works like this: only two types of backend will be exposed by ggml at a time (Vulkan and CPU, ROCm and CPU, etc), so it'll prefer the first discrete GPU, then the first iGPU; and use it for everything (the flags --x-on-cpu still work the same way as before).
In the sense of, say, Vulkan0 versus Vulkan1, I think the selection has changed: I have a lot of scripts to convince everybody that my dGPU is at a specific slot (it often switches places with the iGPU at boot 😕), and sd.cpp now seems to correctly pick the dGPU regardless of them. But note: even if the previous algorithm was just "pick the first device", technically it was already the C++ layer choosing it (unless forced with We will likely get more sophisticated selection soon (like "Vulkan1 for the diffusion, Vulkan0 for the VAE, CPU for the conditioner"), but we are not there yet: it's still the same backend for everything, except for --x-on-cpu. I made a hackish adaptation for
This way, if the Python layer decides it wants to force a device, let C++ pick whatever, or even support fancy aliases like "iGPU", it can. In fact, we could even add support for placing models on different devices before upstream: the diff would be pretty simple at this point. |
|
@LostRuins , I've just noticed this: bool load_model(const load_model_inputs inputs)
{
(...)
std::string vulkan_info_raw = inputs.vulkan_info;
std::string vulkan_info_str = "";
for (size_t i = 0; i < vulkan_info_raw.length(); ++i) {
vulkan_info_str += vulkan_info_raw[i];
if (i < vulkan_info_raw.length() - 1) {
vulkan_info_str += ",";
}
}
const char* existingenv = getenv("GGML_VK_VISIBLE_DEVICES");
if(!existingenv && vulkan_info_str!="")
{
vulkandeviceenv = "GGML_VK_VISIBLE_DEVICES="+vulkan_info_str;
putenv((char*)vulkandeviceenv.c_str());
}
|
|
And as far as I can tell, apart from I've added device reporting to the My plan is changing the "on_gpu" booleans at the C++ interface to receive the device index instead. So we'd have integers
I believe this will be enough for any behavior we need:
|
|
If you search |
1d5c5da to
c002d36
Compare
|
I've simplified the About the |
|
Gave it a try: #2179 . I've notice the |
Since master-592-b8079e2, no sd.cpp source depends on the ggml backend build anymore.
c002d36 to
3288741
Compare
|
turns out its shadowing caused by I can sort of work around it by making a separate utils header safe for sdcpp and put the llm related functions elsewhere. |
|
Not the best solution but i did this: 950676f ideally sd.cpp should still use different function names but since its probably bad to change upstream code unless we have to, this is the best we can do probably... |
Yeah... As I understand it, it was intentional, to be able to load even the cpu backend as optional. But that should be changed in ggml itself.
Looks good (I wouldn't even create a separate llmutils.cpp, but maybe it also helps keeping it organized). What's the reason we do it this way, including the source files in the sdtype_adapter itself instead of building them separately? A normal split build would mostly eliminate these kind of issues.
I'll bring this issue upstream. Code-wise, I don't think that header needs to be globally exposed to work; and we could also mass-rename the symbols to get them out of the way. Unfortunately, this doesn't affect the public interface, so it may be tricky to argue about why is it needed. |
mainly because its a huge pain to juggle the individual object files in the makefile, since we build for many different targets. Each compilation unit except the final one can only consist of a single .cpp file, so you end up having many object files catered to each target (which gets worse if there's build specific flags, then you have CMake solves that problem (it's a lot easier with cmakelists) but adds a new problem where cmake is a dependency, and also does not support easily building for multiple targets at once. |
master-592-b8079e2removes all build-time backend dependencies on sd.cpp sources.The backend change is very incompatible with many ggml includes: I've had to remove
util.hfromsdtypes_adapter.cppbecause it triggered weird errors onllama.h. A better solution could be splittingutil.hinto two headers, but I didn't want to change stuff outside otherarch/sdcpp in the middle of this PR. I've also had to adapt ourmain_gpusupport, but didn't test it very well.On the other hand, since sd.cpp became backend-agnostic,
sdtypes_adapter.cppnow only needs to be built once for all backends. Both Vulkan and ROCm seem to be running fine with the change.After #2155 .Edit: moving the discussion here to make it easier to follow.